This demo page focuses on singing technique conversion.
The following are the audio samples of Fig. 2(b) in the paper.
The audio files below are all converted from Mel-spectrograms using Griffin-Lim.
Therefore, the samples labeled "original Mel-spectrogram" are the upper bound on the audio quality of each conversion.
Again, these samples are obtained by inverting their Mel-spectrograms to audio.
Note that singers may express a vocal technique in distinct ways, and may also sing with different levels of expression at different time instants. This creates ambiguity in defining vocal techniques and poses challenges for data-driven models.